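# Packages assumed from the functions called in this report
library(readxl); library(zoo); library(xts); library(dplyr)
library(funModeling); library(Hmisc); library(tseries); library(vars); library(TSstudio)
setwd("/Users/gustavoacosta/Desktop/5 semestre/intro econometrics/datasets1")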
dir()
## [1] "~$updated_cocasales_data.xlsx" "updated_cocasales_data.xlsx"
cocacolasales_new <- read_excel("updated_cocasales_data.xlsx")
cocacolasales_new2 <- read_excel("updated_cocasales_data.xlsx", sheet = "tseries3 - GDL" )

1 Introduction

How is time series analysis useful for forecasting an outcome?

In this second evidence we employ time series analysis, a useful tool for forecasting. It is useful because it shows how the data change over time, lets us identify the direction in which they are changing, and reveals trends in data collected from a specific unit of analysis over a period of time.

2 Background

Our problem situation is based on Coca-Cola FEMSA, a Mexican multinational in the beverage industry. It is currently one of the largest bottling companies in Latin America and operates in 10 Latin American countries and the Philippines. Its financial reports from 2015 to 2018 show that unit-box sales were highest in March and May 2018 and lowest in January-March 2017, and that sales were not very consistent during the last year covered by the reports.

3 Description of the problem situation

In this evidence our main objective is to analyze how seasonal phenomena affect unit-box sales and how sales respond to seasons such as summer and winter. We will build regression and time series models based on the behavior of the components of the series, and select a predictive model to help us estimate sales while taking into account the different components of time series data.

4 Data and Methodology

Dependent Variable:

  • Sales Unit boxes

Independent Variables:

  • Date (time)

4.1 Data Sources and study period

The data set covers Coca-Cola FEMSA from 2015 to 2018 at a monthly frequency. It contains our dependent variable, unit-box sales, which is the focus of our forecasts, along with other independent variables that we can analyze seasonally, such as the weather.

4.2 Exploratory Data Analysis

cocacolasales_new$date=as.yearmon(cocacolasales_new$date,format="%Y/%m")
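The conversion above relies on as.yearmon() from the zoo package. As a minimal sketch of what the parsing step does, year/month strings of the form "2015/01" can also be turned into first-of-month dates with base R only. The helper name parse_year_month and the sample strings are illustrative assumptions, not part of the original script.

```r
# Hypothetical helper: parse "YYYY/MM" strings into Date objects
# (first day of each month), using base R only.
parse_year_month <- function(x) {
  # Append a day so that as.Date() has a complete date to parse
  as.Date(paste0(x, "/01"), format = "%Y/%m/%d")
}

sample_dates <- parse_year_month(c("2015/01", "2015/02", "2018/12"))
format(sample_dates, "%Y-%m")  # back to a year-month label
```

Unlike yearmon objects, these are ordinary Date values, so they carry a day of the month that has no meaning for monthly data.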

Here we have a quick overview of the data set, including descriptive statistics such as the mean and Gini's mean difference (Gmd) of each variable, along with data types and missing values.

basic_eda <- function(data)
{
  glimpse(data)
  df_status(data)
  freq(data) 
  profiling_num(data)
  plot_num(data)
  describe(data)
}
basic_eda(cocacolasales_new)
## Rows: 48
## Columns: 2
## $ date               <yearmon> Jan 2015, Feb 2015, Mar 2015, Apr 2015, May 201…
## $ ccsales_unit_boxes <dbl> 5516689, 5387496, 5886747, 6389182, 6448275, 669794…
##             variable q_zeros p_zeros q_na p_na q_inf p_inf    type unique
## 1               date       0       0    0    0     0     0 yearmon     48
## 2 ccsales_unit_boxes       0       0    0    0     0     0 numeric     48
## Warning in freq(data): None of the input variables are factor nor character
## Warning: attributes are not identical across measure variables; they will be
## dropped
## Warning: `guides(<scale> = FALSE)` is deprecated. Please use `guides(<scale> =
## "none")` instead.

## data 
## 
##  2  Variables      48  Observations
## --------------------------------------------------------------------------------
## date 
##        n  missing distinct 
##       48        0       48 
## 
## lowest : Jan 2015 Feb 2015 Mar 2015 Apr 2015 May 2015
## highest: Aug 2018 Sep 2018 Oct 2018 Nov 2018 Dec 2018
## --------------------------------------------------------------------------------
## ccsales_unit_boxes 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##       48        0       48        1  6473691   680321  5491459  5576844 
##      .25      .50      .75      .90      .95 
##  6171767  6461357  6819782  7288957  7396022 
## 
## lowest : 5301755 5387496 5477874 5516689 5568552
## highest: 7330137 7345037 7423475 7457473 7963063
## --------------------------------------------------------------------------------
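The Gmd column in the describe() output is Gini's mean difference, the average absolute difference between all pairs of distinct observations. A minimal base R sketch of that definition follows; the function name gini_mean_difference and the toy vector are assumptions made for illustration.

```r
# Gini's mean difference: the mean absolute difference over all pairs i != j.
gini_mean_difference <- function(x) {
  x <- x[!is.na(x)]
  n <- length(x)
  stopifnot(n >= 2)
  # outer() builds the full n x n matrix of pairwise differences;
  # the diagonal is zero, so dividing by n * (n - 1) averages over i != j.
  sum(abs(outer(x, x, "-"))) / (n * (n - 1))
}

gini_mean_difference(c(1, 2, 3))  # the pairs differ by 1, 2 and 1, so the mean is 4/3
```

Like the standard deviation, it measures spread in the units of the variable, but it is less sensitive to extreme values.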

Next, we plot the first sheet of our data set, which contains two variables: unit-box sales and the monthly date from 2015 to 2018.

Plot 1

This plot shows Coca-Cola's unit-box sales and how they change over time. Sharp drops recur in several years and interrupt the otherwise steady behavior of the series across the four years of data.

plot(cocacolasales_new$date,cocacolasales_new$ccsales_unit_boxes, type="l",col="blue", lwd=2, xlab ="Date",ylab ="Sales", main = "Coca Cola Femsa Sales Unit")

Alternative TS Plot 2

Here is an alternative plot of unit-box sales. Unlike the previous plot, it labels the dates by month, which makes it easier to identify the low points in sales, for example around February 2016, 2017 and 2018.

plot1xts<-xts(cocacolasales_new$ccsales_unit_boxes,order.by=cocacolasales_new$date)
plot(plot1xts)

4.3 Second Set of Visualizations

This second set of visualizations is for the data set we will use for the VAR model. Here we have a quick overview of the variables and some descriptive statistics for each column.

Dependent Variable:

  • Sales Unit boxes

Independent Variables:

  • consumer_sentiment
  • inflation_rate
  • gdp_percapita
  • itaee
  • pop_density
  • job_density
  • max_temperature
  • holiday_month
  • unemp_rate
basic_eda <- function(data)
{
  glimpse(data)
  df_status(data)
  freq(data) 
  profiling_num(data)
  plot_num(data)
  describe(data)
}
basic_eda(cocacolasales_new2)
## Rows: 48
## Columns: 15
## $ date               <chr> "2015/01", "2015/02", "2015/03", "2015/04", "2015/0…
## $ sales_unitboxes    <dbl> 5516689, 5387496, 5886747, 6389182, 6448275, 669794…
## $ consumer_sentiment <dbl> 38.06250, 37.49114, 38.50522, 37.84286, 38.03169, 3…
## $ CPI                <dbl> 87.11010, 87.27538, 87.63072, 87.40384, 86.96737, 8…
## $ inflation_rate     <dbl> -0.09, 0.19, 0.41, -0.26, -0.50, 0.17, 0.15, 0.21, …
## $ unemp_rate         <dbl> 0.05230256, 0.05311320, 0.04608844, 0.05102038, 0.0…
## $ gdp_percapita      <dbl> 11659.56, 11659.55, 11659.55, 11625.75, 11625.74, 1…
## $ itaee              <dbl> 103.7654, 103.7654, 103.7654, 107.7518, 107.7518, 1…
## $ itaee_growth       <dbl> 0.049716574, 0.049716574, 0.049716574, 0.031838981,…
## $ pop_density        <dbl> 98.54185, 98.54186, 98.54187, 98.82843, 98.82844, 9…
## $ job_density        <dbl> 18.26048, 18.46329, 18.64164, 18.67876, 18.67539, 1…
## $ pop_minwage        <dbl> 9.657861, 9.657861, 9.657861, 9.594919, 9.594919, 9…
## $ exchange_rate      <dbl> 14.69259, 14.92134, 15.22834, 15.22618, 15.26447, 1…
## $ max_temperature    <dbl> 28, 31, 29, 32, 34, 32, 29, 29, 29, 29, 29, 26, 28,…
## $ holiday_month      <dbl> 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, …
##              variable q_zeros p_zeros q_na p_na q_inf p_inf      type unique
## 1                date       0       0    0    0     0     0 character     48
## 2     sales_unitboxes       0       0    0    0     0     0   numeric     48
## 3  consumer_sentiment       0       0    0    0     0     0   numeric     48
## 4                 CPI       0       0    0    0     0     0   numeric     48
## 5      inflation_rate       0       0    0    0     0     0   numeric     41
## 6          unemp_rate       0       0    0    0     0     0   numeric     48
## 7       gdp_percapita       0       0    0    0     0     0   numeric     48
## 8               itaee       0       0    0    0     0     0   numeric     16
## 9        itaee_growth       0       0    0    0     0     0   numeric     16
## 10        pop_density       0       0    0    0     0     0   numeric     48
## 11        job_density       0       0    0    0     0     0   numeric     45
## 12        pop_minwage       0       0    0    0     0     0   numeric     16
## 13      exchange_rate       0       0    0    0     0     0   numeric     48
## 14    max_temperature       0       0    0    0     0     0   numeric     12
## 15      holiday_month      36      75    0    0     0     0   numeric      2
## Warning: `guides(<scale> = FALSE)` is deprecated. Please use `guides(<scale> =
## "none")` instead.

## Warning: `guides(<scale> = FALSE)` is deprecated. Please use `guides(<scale> =
## "none")` instead.

## data 
## 
##  15  Variables      48  Observations
## --------------------------------------------------------------------------------
## date 
##        n  missing distinct 
##       48        0       48 
## 
## lowest : 2015/01 2015/02 2015/03 2015/04 2015/05
## highest: 2018/08 2018/09 2018/10 2018/11 2018/12
## --------------------------------------------------------------------------------
## sales_unitboxes 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##       48        0       48        1  6473691   680321  5491459  5576844 
##      .25      .50      .75      .90      .95 
##  6171767  6461357  6819782  7288957  7396022 
## 
## lowest : 5301755 5387496 5477874 5516689 5568552
## highest: 7330137 7345037 7423475 7457473 7963063
## --------------------------------------------------------------------------------
## consumer_sentiment 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##       48        0       48        1    37.15    3.041    33.93    34.63 
##      .25      .50      .75      .90      .95 
##    35.64    36.76    38.14    41.81    42.84 
## 
## lowest : 28.66787 31.51561 33.79513 34.18934 34.33673
## highest: 42.13270 42.53301 43.00569 43.34109 44.86544
## --------------------------------------------------------------------------------
## CPI 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##       48        0       48        1     93.4    5.811    87.16    87.37 
##      .25      .50      .75      .90      .95 
##    89.18    92.82    98.40   100.08   101.26 
## 
## lowest :  86.96737  87.11010  87.11311  87.24082  87.27538
## highest: 100.49200 100.91700 101.44000 102.30300 103.02000
## --------------------------------------------------------------------------------
## inflation_rate 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##       48        0       41    0.999   0.3485   0.4164  -0.3330  -0.1900 
##      .25      .50      .75      .90      .95 
##   0.1650   0.3850   0.5575   0.6510   0.8255 
## 
## lowest : -0.50 -0.45 -0.34 -0.32 -0.26, highest:  0.70  0.78  0.85  1.03  1.70
## --------------------------------------------------------------------------------
## unemp_rate 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##       48        0       48        1  0.04442 0.006762  0.03648  0.03747 
##      .25      .50      .75      .90      .95 
##  0.04010  0.04369  0.04897  0.05373  0.05413 
## 
## lowest : 0.03466221 0.03587220 0.03641392 0.03659655 0.03677829
## highest: 0.05383592 0.05394831 0.05423057 0.05473379 0.05517447
## --------------------------------------------------------------------------------
## gdp_percapita 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##       48        0       48        1    11979    287.3    11570    11592 
##      .25      .50      .75      .90      .95 
##    11830    12014    12162    12297    12318 
## 
## lowest : 11558.59 11558.59 11558.59 11591.89 11591.89
## highest: 12296.98 12296.98 12329.04 12329.04 12329.05
##                                                                             
## Value      11558 11592 11626 11660 11886 11920 11954 11988 12040 12072 12106
## Frequency      3     3     3     3     3     3     3     3     3     3     3
## Proportion 0.062 0.062 0.062 0.062 0.062 0.062 0.062 0.062 0.062 0.062 0.062
##                                         
## Value      12138 12232 12264 12296 12330
## Frequency      3     3     3     3     3
## Proportion 0.062 0.062 0.062 0.062 0.062
## 
## For the frequency table, variable is rounded to the nearest 2
## --------------------------------------------------------------------------------
## itaee 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##       48        0       16    0.997    113.9    5.423    105.2    107.8 
##      .25      .50      .75      .90      .95 
##    111.5    113.5    117.1    119.8    121.5 
## 
## lowest : 103.7654 107.7518 108.7077 110.5957 111.7800
## highest: 117.0615 117.3254 118.9366 119.7875 122.4821
## 
## 103.7653537 (3, 0.062), 107.7518353 (3, 0.062), 108.7076578 (3, 0.062),
## 110.5956589 (3, 0.062), 111.779976 (3, 0.062), 111.7936456 (3, 0.062),
## 112.6669197 (3, 0.062), 113.2336441 (3, 0.062), 113.7050866 (3, 0.062),
## 115.672343 (3, 0.062), 116.373846 (3, 0.062), 117.0614999 (3, 0.062),
## 117.325411 (3, 0.062), 118.9365563 (3, 0.062), 119.7875356 (3, 0.062),
## 122.4821017 (3, 0.062)
## --------------------------------------------------------------------------------
## itaee_growth 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##       48        0       16    0.997  0.03174   0.0167 0.006355 0.007811 
##      .25      .50      .75      .90      .95 
## 0.022376 0.029977 0.043038 0.049717 0.054149 
## 
## lowest : 0.005570848 0.007811483 0.021536875 0.022021360 0.022494545
## highest: 0.041634475 0.047249285 0.047629618 0.049716574 0.056536274
## 
## 0.005570848 (3, 0.062), 0.007811483 (3, 0.062), 0.021536875 (3, 0.062),
## 0.02202136 (3, 0.062), 0.022494545 (3, 0.062), 0.02328721 (3, 0.062),
## 0.023470888 (3, 0.062), 0.028115278 (3, 0.062), 0.031838981 (3, 0.062),
## 0.037510361 (3, 0.062), 0.041347463 (3, 0.062), 0.041634475 (3, 0.062),
## 0.047249285 (3, 0.062), 0.047629618 (3, 0.062), 0.049716574 (3, 0.062),
## 0.056536274 (3, 0.062)
## --------------------------------------------------------------------------------
## pop_density 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##       48        0       48        1    100.6    1.502    98.64    98.83 
##      .25      .50      .75      .90      .95 
##    99.61   100.67   101.69   102.43   102.60 
## 
## lowest :  98.54185  98.54186  98.54187  98.82843  98.82844
## highest: 102.42910 102.42912 102.69447 102.69449 102.69450
## --------------------------------------------------------------------------------
## job_density 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##       48        0       45        1    20.38    1.455    18.64    18.68 
##      .25      .50      .75      .90      .95 
##    19.28    20.39    21.60    21.93    22.10 
## 
## lowest : 18.26048 18.46329 18.64164 18.64668 18.67539
## highest: 21.97487 22.03936 22.13799 22.24837 22.36215
## --------------------------------------------------------------------------------
## pop_minwage 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##       48        0       16    0.997    11.12    1.114    9.467    9.595 
##      .25      .50      .75      .90      .95 
##   10.794   11.139   11.413   12.722   12.920 
## 
## lowest :  9.398393  9.594919  9.657861 10.675655 10.833710
## highest: 11.327926 11.669528 12.297004 12.721926 13.026305
## 
## 9.398392752 (3, 0.062), 9.594918702 (3, 0.062), 9.657860913 (3, 0.062),
## 10.67565544 (3, 0.062), 10.83370977 (3, 0.062), 10.88170258 (3, 0.062),
## 10.94476958 (3, 0.062), 11.04232751 (3, 0.062), 11.23630782 (3, 0.062),
## 11.24085004 (3, 0.062), 11.30092217 (3, 0.062), 11.32792593 (3, 0.062),
## 11.66952843 (3, 0.062), 12.29700388 (3, 0.062), 12.7219262 (3, 0.062),
## 13.02630495 (3, 0.062)
## --------------------------------------------------------------------------------
## exchange_rate 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##       48        0       48        1    18.18    1.797    15.23    15.42 
##      .25      .50      .75      .90      .95 
##    17.38    18.62    19.06    20.16    20.30 
## 
## lowest : 14.69259 14.92134 15.22618 15.22834 15.26447
## highest: 20.26117 20.29054 20.30320 20.52058 21.38527
## --------------------------------------------------------------------------------
## max_temperature 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##       48        0       12    0.974     30.5    2.961    27.00    28.00 
##      .25      .50      .75      .90      .95 
##    29.00    30.00    32.25    34.30    35.00 
## 
## lowest : 26 27 28 29 30, highest: 33 34 35 36 37
##                                                                             
## Value         26    27    28    29    30    31    32    33    34    35    36
## Frequency      2     2     6    13     4     5     4     6     1     3     1
## Proportion 0.042 0.042 0.125 0.271 0.083 0.104 0.083 0.125 0.021 0.062 0.021
##                 
## Value         37
## Frequency      1
## Proportion 0.021
## --------------------------------------------------------------------------------
## holiday_month 
##        n  missing distinct     Info      Sum     Mean      Gmd 
##       48        0        2    0.563       12     0.25    0.383 
## 
## --------------------------------------------------------------------------------
summary(cocacolasales_new2)
##      date           sales_unitboxes   consumer_sentiment      CPI        
##  Length:48          Min.   :5301755   Min.   :28.67      Min.   : 86.97  
##  Class :character   1st Qu.:6171767   1st Qu.:35.64      1st Qu.: 89.18  
##  Mode  :character   Median :6461357   Median :36.76      Median : 92.82  
##                     Mean   :6473691   Mean   :37.15      Mean   : 93.40  
##                     3rd Qu.:6819782   3rd Qu.:38.14      3rd Qu.: 98.40  
##                     Max.   :7963063   Max.   :44.87      Max.   :103.02  
##  inflation_rate      unemp_rate      gdp_percapita       itaee      
##  Min.   :-0.5000   Min.   :0.03466   Min.   :11559   Min.   :103.8  
##  1st Qu.: 0.1650   1st Qu.:0.04010   1st Qu.:11830   1st Qu.:111.5  
##  Median : 0.3850   Median :0.04369   Median :12014   Median :113.5  
##  Mean   : 0.3485   Mean   :0.04442   Mean   :11979   Mean   :113.9  
##  3rd Qu.: 0.5575   3rd Qu.:0.04897   3rd Qu.:12162   3rd Qu.:117.1  
##  Max.   : 1.7000   Max.   :0.05517   Max.   :12329   Max.   :122.5  
##   itaee_growth       pop_density      job_density     pop_minwage    
##  Min.   :0.005571   Min.   : 98.54   Min.   :18.26   Min.   : 9.398  
##  1st Qu.:0.022376   1st Qu.: 99.61   1st Qu.:19.28   1st Qu.:10.794  
##  Median :0.029977   Median :100.67   Median :20.39   Median :11.139  
##  Mean   :0.031736   Mean   :100.65   Mean   :20.38   Mean   :11.116  
##  3rd Qu.:0.043038   3rd Qu.:101.69   3rd Qu.:21.60   3rd Qu.:11.413  
##  Max.   :0.056536   Max.   :102.69   Max.   :22.36   Max.   :13.026  
##  exchange_rate   max_temperature holiday_month 
##  Min.   :14.69   Min.   :26.00   Min.   :0.00  
##  1st Qu.:17.38   1st Qu.:29.00   1st Qu.:0.00  
##  Median :18.62   Median :30.00   Median :0.00  
##  Mean   :18.18   Mean   :30.50   Mean   :0.25  
##  3rd Qu.:19.06   3rd Qu.:32.25   3rd Qu.:0.25  
##  Max.   :21.39   Max.   :37.00   Max.   :1.00

5 Result Analysis

5.1 Stationary and Non- Stationary

The Augmented Dickey-Fuller (ADF) test checks whether a time series is stationary. Its result informs how the ARIMA, SARIMA and ARIMAX models are then estimated with a statistical package.

  • H0: The time series is non-stationary (not rejected when p-value > 0.05)
  • HA: The time series is stationary (H0 is rejected when p-value < 0.05)

Stationary Test:

The p-value is smaller than 0.05, so we reject the null hypothesis and conclude that the series is stationary. This means that the statistical properties of the process generating the series do not change over time. It does not mean the data never change, but that the way they change stays the same over time; in particular, the mean stays constant over the period.

adf.test(cocacolasales_new$ccsales_unit_boxes) 
## Warning in adf.test(cocacolasales_new$ccsales_unit_boxes): p-value smaller than
## printed p-value
## 
##  Augmented Dickey-Fuller Test
## 
## data:  cocacolasales_new$ccsales_unit_boxes
## Dickey-Fuller = -4.4282, Lag order = 3, p-value = 0.01
## alternative hypothesis: stationary
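To illustrate the idea behind the test, here is a hedged sketch of the simplest Dickey-Fuller regression on simulated data. It is not what adf.test() computes exactly (that function also includes a trend term and lagged differences), and the resulting statistic has to be compared with Dickey-Fuller critical values rather than the usual t distribution. The series and the helper name df_statistic are assumptions made for illustration.

```r
set.seed(123)  # simulated data only; not the Coca-Cola FEMSA series
n <- 500
stationary_series <- as.numeric(arima.sim(model = list(ar = 0.5), n = n))
random_walk <- cumsum(rnorm(n))

# Simplified Dickey-Fuller statistic: t value of gamma in
#   diff(y)_t = alpha + gamma * y_{t-1} + e_t
# (no lagged differences and no trend term, unlike tseries::adf.test).
df_statistic <- function(y) {
  dy <- diff(y)
  y_lag <- y[-length(y)]
  fit <- lm(dy ~ y_lag)
  unname(coef(summary(fit))["y_lag", "t value"])
}

df_statistic(stationary_series)  # strongly negative: evidence against a unit root
df_statistic(random_walk)        # typically much closer to zero
```

A strongly negative statistic means the series is pulled back toward its mean after each shock, which is the behavior the ADF test looks for.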

ACF plot

In our plot we see significant autocorrelation around lags 1, 12 and 16. This shows how the time series is correlated with its own past values.

acf(cocacolasales_new$ccsales_unit_boxes,main="Significant Autocorrelations")
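Reading significant lags off the plot can also be done numerically. The sketch below, on a synthetic seasonal series, lists the lags whose sample autocorrelation falls outside the approximate 95% band of ±1.96/sqrt(n). The toy series and the helper name significant_lags are assumptions, not part of the original analysis.

```r
# Toy seasonal series (synthetic): a pure 12-month sine wave over 48 months
t_index <- 1:48
toy_series <- sin(2 * pi * t_index / 12)

# Return the lags whose sample autocorrelation lies outside +/- 1.96 / sqrt(n)
significant_lags <- function(x) {
  n <- length(x)
  r <- acf(x, plot = FALSE)
  lags <- as.numeric(r$lag)
  values <- as.numeric(r$acf)
  bound <- 1.96 / sqrt(n)
  lags[lags > 0 & abs(values) > bound]
}

significant_lags(toy_series)  # includes lag 12, one full seasonal cycle
```

For monthly data, a significant spike at lag 12 is the usual signature of yearly seasonality.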

Decompose a time series

Here we decompose the time series. We can note the following:

  • The first panel, “observed”, is very similar to our first visualization in this R script and shows a pattern and a slight trend.

  • The first component is the trend, which shows a positive, roughly linear behavior: the series tends to increase over time.

  • The second component is the seasonality, the repeating patterns over time. Here sales increase for a certain period and then drop to a low point.

  • The third component is the random component, which shows the variability that the trend and seasonality cannot explain. Its fluctuations are not constant over time, with some spikes and dips around 2017.

arcaipcts<-ts(cocacolasales_new$ccsales_unit_boxes,frequency=12,start=c(2015,1))
arcapcdec<-decompose(arcaipcts)
plot(arcapcdec)
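The three components can be reproduced by hand, which makes clear what decompose() is doing in its default additive mode. The sketch below uses a synthetic monthly series, so its numbers say nothing about the sales data; all object names are illustrative assumptions.

```r
# Synthetic monthly series: linear trend plus a fixed 12-month pattern
t_index <- 1:48
toy_ts <- ts(100 + 2 * t_index + 10 * sin(2 * pi * t_index / 12),
             frequency = 12, start = c(2015, 1))

# 1. Trend: centered 2x12 moving average
ma_weights <- c(0.5, rep(1, 11), 0.5) / 12
trend <- stats::filter(toy_ts, ma_weights, sides = 2)

# 2. Seasonal figure: average detrended value per calendar month, centered at zero
detrended <- toy_ts - trend
seasonal_figure <- tapply(as.numeric(detrended), as.integer(cycle(toy_ts)),
                          mean, na.rm = TRUE)
seasonal_figure <- seasonal_figure - mean(seasonal_figure)

# 3. Random component: what is left after removing trend and seasonality
seasonal <- rep(as.numeric(seasonal_figure), length.out = length(toy_ts))
random <- toy_ts - trend - seasonal

# The hand-computed figure should agree with decompose()
all.equal(as.numeric(seasonal_figure), decompose(toy_ts)$figure)
```

Because the moving average needs six months on each side, the trend and random components are missing for the first and last six months of the sample.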

5.2 Regression Model Specification

5.2.1 MODEL 1: Fitting the ARMA(1,1)

The autoregressive component (ar1) is statistically significant; the moving average component (ma1) is not (p = 0.972).

setwd("/Users/gustavoacosta/Desktop/5 semestre/intro econometrics/datasets1")
coca1 <- read_excel("updated_cocasales_data.xlsx")
  • AIC : 1400.62
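  • ar1 is statistically significant (p < 2e-16); ma1 is not (p = 0.972)
  • In the residual ACF plot we see no autocorrelation among the lags, except for lags 20 and 25
  • Box-Ljung test: p-value = 0.9991, so we cannot reject the null hypothesis of no autocorrelation in the residuals at lag 1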
summary(ARMA.mydata<-arma(coca1$ccsales_unit_boxes, order=c(1,1)))
## Warning in arma(coca1$ccsales_unit_boxes, order = c(1, 1)): Hessian negative-
## semidefinite
## Warning in sqrt(diag(object$vcov)): NaNs produced

## Warning in sqrt(diag(object$vcov)): NaNs produced
## 
## Call:
## arma(x = coca1$ccsales_unit_boxes, order = c(1, 1))
## 
## Model:
## ARMA(1,1)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1146267  -335947    44609   276851  1124630 
## 
## Coefficient(s):
##            Estimate  Std. Error  t value Pr(>|t|)    
## ar1       5.271e-01   1.059e-02   49.770   <2e-16 ***
## ma1       4.901e-03   1.422e-01    0.034    0.972    
## intercept 3.083e+06          NA       NA       NA    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Fit:
## sigma^2 estimated as 2.431e+11,  Conditional Sum-of-Squares = 1.118125e+13,  AIC = 1400.62
plot(ARMA.mydata)

ARMA.residuals<-(ARMA.mydata$residuals)
ARMA.residuals<-na.omit(ARMA.residuals) 
acf(ARMA.residuals,main="ACF - ARMA (1,1)")

Box.test(ARMA.residuals,lag=1,type="Ljung-Box")
## 
##  Box-Ljung test
## 
## data:  ARMA.residuals
## X-squared = 1.2684e-06, df = 1, p-value = 0.9991
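The test above checks only lag 1. As a hedged sketch, the same check can be run jointly over several lags, using the fitdf argument of Box.test() to account for the estimated ARMA coefficients. The data here are simulated and stats::arima() is used as a stand-in for the arma() call above, so the numbers do not describe the sales series.

```r
set.seed(42)  # simulated data only
sim_series <- arima.sim(model = list(ar = 0.5, ma = 0.3), n = 200)

# Fit ARMA(1,1) with stats::arima (a stand-in for tseries::arma used above)
fit <- arima(sim_series, order = c(1, 0, 1))

# Ljung-Box test on the residuals at 10 lags; fitdf = 2 removes one degree
# of freedom for each estimated ARMA coefficient (ar1 and ma1)
lb <- Box.test(residuals(fit), lag = 10, type = "Ljung-Box", fitdf = 2)
lb$parameter  # degrees of freedom: 10 - 2 = 8
lb$p.value    # a large p-value means no evidence of leftover autocorrelation
```

Testing several lags at once is a stricter check that the model has captured the dependence in the series, not just the first-order part.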

5.2.2 MODEL 2: ARIMA (1,1,1)

setwd("/Users/gustavoacosta/Desktop/5 semestre/intro econometrics/datasets1")
coca2 <- read_excel("updated_cocasales_data.xlsx")
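  • AIC: -97.31 (computed on log sales, so not directly comparable with the AIC of Model 1)
  • ar1 and ma1 are statistically significant (each estimate is more than four standard errors from zero)
  • There is autocorrelation in the residuals at lags 4, 12 and 16
  • Box-Ljung test: p-value = 0.8713 (no evidence of residual autocorrelation at lag 1)
  • ADF test on the residuals: p-value = 0.01 (stationary)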
ARIMA.mydatar<-arima(log(coca2$ccsales_unit_boxes), order=c(1,1,1))
print(ARIMA.mydatar)
## 
## Call:
## arima(x = log(coca2$ccsales_unit_boxes), order = c(1, 1, 1))
## 
## Coefficients:
##          ar1      ma1
##       0.5737  -0.9791
## s.e.  0.1396   0.1263
## 
## sigma^2 estimated as 0.006255:  log likelihood = 51.65,  aic = -97.31
acf(ARIMA.mydatar$residuals,main="ACF - ARIMA (1,1,1)")

Box.test(ARIMA.mydatar$residuals,lag=1,type="Ljung-Box")
## 
##  Box-Ljung test
## 
## data:  ARIMA.mydatar$residuals
## X-squared = 0.026233, df = 1, p-value = 0.8713
adf.test(ARIMA.mydatar$residual)
## Warning in adf.test(ARIMA.mydatar$residual): p-value smaller than printed p-
## value
## 
##  Augmented Dickey-Fuller Test
## 
## data:  ARIMA.mydatar$residual
## Dickey-Fuller = -4.4059, Lag order = 3, p-value = 0.01
## alternative hypothesis: stationary
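arima() prints estimates and standard errors but no p-values. A rough significance check divides each estimate by its standard error and uses a normal approximation. The sketch below applies this to the coefficients reported in the ARIMA(1,1,1) output above; the object names are illustrative.

```r
# Estimates and standard errors copied from the ARIMA(1,1,1) output above
estimates  <- c(ar1 = 0.5737, ma1 = -0.9791)
std_errors <- c(ar1 = 0.1396, ma1 = 0.1263)

# Approximate z statistics and two-sided p-values (normal approximation)
z_values <- estimates / std_errors
p_values <- 2 * pnorm(-abs(z_values))

round(z_values, 2)
p_values < 0.05
```

With the fitted object at hand, the same quantities could be taken from coef(ARIMA.mydatar) and sqrt(diag(ARIMA.mydatar$var.coef)) instead of being typed in.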

6 VAR Estimation

setwd("/Users/gustavoacosta/Desktop/5 semestre/intro econometrics/datasets1")
coca3 <- read_excel("updated_cocasales_data.xlsx", sheet = "tseries3 - GDL" )
coca3$date=as.Date(as.yearmon(coca3$date,format="%Y/%m"))

6.1 Converting to time series format

consumer_sentiment<-ts(coca3$consumer_sentiment,start=c(2015,1),end=c(2018,12),frequency=12)
CPI<-ts(coca3$CPI,start=c(2015,1),end=c(2018,12),frequency=12)
inflation_rate<-ts(coca3$inflation_rate,start=c(2015,1),end=c(2018,12),frequency=12)
unemp_rate<-ts(coca3$unemp_rate,start=c(2015,1),end=c(2018,12),frequency=12)
gdp_percapita<-ts(coca3$gdp_percapita,start=c(2015,1),end=c(2018,12),frequency=12)
itaee<-ts(coca3$itaee,start=c(2015,1),end=c(2018,12),frequency=12)
itaee_growth<-ts(coca3$itaee_growth,start=c(2015,1),end=c(2018,12),frequency=12)
pop_density<-ts(coca3$pop_density,start=c(2015,1),end=c(2018,12),frequency=12)
job_density<-ts(coca3$job_density,start=c(2015,1),end=c(2018,12),frequency=12)
pop_minwage<-ts(coca3$pop_minwage,start=c(2015,1),end=c(2018,12),frequency=12)
exchange_rate<-ts(coca3$exchange_rate,start=c(2015,1),end=c(2018,12),frequency=12)
max_temperature<-ts(coca3$max_temperature,start=c(2015,1),end=c(2018,12),frequency=12)
holiday_month<-ts(coca3$holiday_month,start=c(2015,1),end=c(2018,12),frequency=12)
sales_unitboxes<-ts(coca3$sales_unitboxes,start=c(2015,1),end=c(2018,12),frequency=12)
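The fourteen calls above repeat the same pattern. As a sketch of a more compact alternative, every numeric column can be converted in one pass and kept in a named list. The helper to_monthly_ts and the toy data frame are assumptions for illustration; the toy values are synthetic and stand in for coca3.

```r
# Hypothetical helper: convert every numeric column of a data frame to a monthly ts
to_monthly_ts <- function(df, start = c(2015, 1)) {
  numeric_cols <- names(df)[vapply(df, is.numeric, logical(1))]
  lapply(df[numeric_cols], ts, start = start, frequency = 12)
}

# Toy data frame standing in for coca3 (values are synthetic)
toy <- data.frame(
  date = sprintf("%d/%02d", rep(2015:2016, each = 12), rep(1:12, times = 2)),
  sales = seq(100, 330, by = 10),
  temperature = rep(c(28, 31, 29, 32, 34, 32, 29, 29, 29, 29, 29, 26), times = 2)
)

series <- to_monthly_ts(toy)
names(series)        # "sales" "temperature"
end(series$sales)    # 2016 12
```

The trade-off is that the series then live in a list (series$sales) rather than as separate objects in the workspace, so later code would need to refer to them that way.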

6.2 Plotting time series data

Here are the time series plots of our independent variables and of unit-box sales.

par(mfrow=c(3,3))
plot(coca3$date,coca3$consumer_sentiment,type="l",col="blue",lwd=2,xlab="Date",ylab="consumer_sentiment",main="consumer_sentiment")
plot(coca3$date,coca3$CPI,type="l",col="blue",lwd=2,xlab="Date",ylab="CPI",main="CPI Rate")
plot(coca3$date,coca3$inflation_rate,type="l",col="blue",lwd=2,xlab="Date",ylab="Inflation",main="Inflation Rate")
plot(coca3$date,coca3$unemp_rate,type="l",col="blue",lwd=2,xlab="Date",ylab="unemp_rate",main="unemp_rate")
plot(coca3$date,coca3$gdp_percapita,type="l",col="blue",lwd=2,xlab="Date",ylab="gdp_percapita",main="gdp_percapita")
plot(coca3$date,coca3$itaee,type="l",col="blue",lwd=2,xlab="Date",ylab="itaee",main="itaee")
plot(coca3$date,coca3$itaee_growth,type="l",col="blue",lwd=2,xlab="Date",ylab="itaee_growth",main="itaee_growth")
plot(coca3$date,coca3$pop_density,type="l",col="blue",lwd=2,xlab="Date",ylab="pop_density",main="pop_density")

plot(coca3$date,coca3$job_density,type="l",col="blue",lwd=2,xlab="Date",ylab="job_density",main="job_density")
plot(coca3$date,coca3$pop_minwage,type="l",col="blue",lwd=2,xlab="Date",ylab="pop_minwage",main="pop_minwage")
plot(coca3$date,coca3$exchange_rate,type="l",col="blue",lwd=2,xlab="Date",ylab="exchange_rate",main="exchange_rate")
plot(coca3$date,coca3$max_temperature,type="l",col="blue",lwd=2,xlab="Date",ylab="max_temperature",main="max_temperature")
plot(coca3$date,coca3$holiday_month,type="l",col="blue",lwd=2,xlab="Date",ylab="holiday_month",main="holiday_month")
plot(coca3$date,coca3$sales_unitboxes,type="l",col="blue",lwd=2,xlab="Date",ylab="sales_unitboxes",main="sales_unitboxes")

Here is another format for the time series plots, showing mostly the same information for our variables.

ts_plot(consumer_sentiment)
ts_plot(CPI)
ts_plot(inflation_rate)
ts_plot(unemp_rate)
ts_plot(gdp_percapita)
ts_plot(itaee)
ts_plot(itaee_growth)
ts_plot(pop_density)
ts_plot(job_density)
ts_plot(pop_minwage)
ts_plot(exchange_rate)
ts_plot(max_temperature)
ts_plot(holiday_month)
ts_plot(sales_unitboxes)
adf.test(coca3$consumer_sentiment)# non-stationary (p-value > 0.05)
## 
##  Augmented Dickey-Fuller Test
## 
## data:  coca3$consumer_sentiment
## Dickey-Fuller = -0.70142, Lag order = 3, p-value = 0.9638
## alternative hypothesis: stationary
adf.test(coca3$CPI) # non-stationary (p-value > 0.05)
## 
##  Augmented Dickey-Fuller Test
## 
## data:  coca3$CPI
## Dickey-Fuller = -2.4733, Lag order = 3, p-value = 0.3851
## alternative hypothesis: stationary
adf.test(coca3$inflation_rate)# non-stationary (p-value > 0.05)
## 
##  Augmented Dickey-Fuller Test
## 
## data:  coca3$inflation_rate
## Dickey-Fuller = -3.2628, Lag order = 3, p-value = 0.08835
## alternative hypothesis: stationary
adf.test(coca3$unemp_rate) # non-stationary (p-value > 0.05)
## 
##  Augmented Dickey-Fuller Test
## 
## data:  coca3$unemp_rate
## Dickey-Fuller = -2.2564, Lag order = 3, p-value = 0.4717
## alternative hypothesis: stationary
adf.test(coca3$gdp_percapita)# non-stationary (p-value > 0.05)
## 
##  Augmented Dickey-Fuller Test
## 
## data:  coca3$gdp_percapita
## Dickey-Fuller = -2.881, Lag order = 3, p-value = 0.2223
## alternative hypothesis: stationary
adf.test(coca3$itaee)# stationary (p-value < 0.05)
## 
##  Augmented Dickey-Fuller Test
## 
## data:  coca3$itaee
## Dickey-Fuller = -3.5209, Lag order = 3, p-value = 0.04927
## alternative hypothesis: stationary
adf.test(coca3$itaee_growth) # non-stationary (p-value > 0.05)
## 
##  Augmented Dickey-Fuller Test
## 
## data:  coca3$itaee_growth
## Dickey-Fuller = -2.8144, Lag order = 3, p-value = 0.2489
## alternative hypothesis: stationary
adf.test(coca3$pop_density)# non-stationary (p-value > 0.05)
## 
##  Augmented Dickey-Fuller Test
## 
## data:  coca3$pop_density
## Dickey-Fuller = -0.56892, Lag order = 3, p-value = 0.9751
## alternative hypothesis: stationary
adf.test(coca3$job_density)# non-stationary (p-value > 0.05)
## 
##  Augmented Dickey-Fuller Test
## 
## data:  coca3$job_density
## Dickey-Fuller = -1.6524, Lag order = 3, p-value = 0.713
## alternative hypothesis: stationary
adf.test(coca3$pop_minwage) # non-stationary (p-value > 0.05)
## 
##  Augmented Dickey-Fuller Test
## 
## data:  coca3$pop_minwage
## Dickey-Fuller = -3.117, Lag order = 3, p-value = 0.128
## alternative hypothesis: stationary
adf.test(coca3$exchange_rate)# non-stationary (p-value > 0.05)
## 
##  Augmented Dickey-Fuller Test
## 
## data:  coca3$exchange_rate
## Dickey-Fuller = -1.9944, Lag order = 3, p-value = 0.5764
## alternative hypothesis: stationary
adf.test(coca3$max_temperature) # stationary (p-value < 0.05)
## 
##  Augmented Dickey-Fuller Test
## 
## data:  coca3$max_temperature
## Dickey-Fuller = -3.5429, Lag order = 3, p-value = 0.04747
## alternative hypothesis: stationary
adf.test(coca3$holiday_month)  # stationary (p-value < 0.05)
## Warning in adf.test(coca3$holiday_month): p-value smaller than printed p-value
## 
##  Augmented Dickey-Fuller Test
## 
## data:  coca3$holiday_month
## Dickey-Fuller = -4.6219, Lag order = 3, p-value = 0.01
## alternative hypothesis: stationary
adf.test(coca3$sales_unitboxes)# stationary (p-value < 0.05)
## Warning in adf.test(coca3$sales_unitboxes): p-value smaller than printed p-value
## 
##  Augmented Dickey-Fuller Test
## 
## data:  coca3$sales_unitboxes
## Dickey-Fuller = -4.4282, Lag order = 3, p-value = 0.01
## alternative hypothesis: stationary
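Running the test column by column produces a long log. A hedged sketch of collecting the p-values into one table follows. It uses PP.test() from base R (the Phillips-Perron unit-root test) as a stand-in so that it runs without extra packages, and the helper unit_root_table and the simulated columns are assumptions, not the report's data.

```r
# Hypothetical helper: apply a unit-root test to every numeric column and
# collect the p-values. `test_fun` must return an object with a `p.value`.
unit_root_table <- function(df, test_fun = stats::PP.test, alpha = 0.05) {
  numeric_cols <- names(df)[vapply(df, is.numeric, logical(1))]
  p_values <- vapply(df[numeric_cols],
                     function(col) test_fun(col)$p.value,
                     numeric(1))
  data.frame(variable = numeric_cols,
             p_value = unname(p_values),
             stationary = unname(p_values) < alpha,
             row.names = NULL)
}

set.seed(1)  # simulated columns only, not the report's data
toy <- data.frame(white_noise = rnorm(200),
                  random_walk = cumsum(rnorm(200)))
unit_root_table(toy)
```

Passing test_fun = tseries::adf.test should give a table based on the ADF test used in this report, since it also returns a p.value element.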
var_tseries1<-cbind(sales_unitboxes,max_temperature,itaee_growth,pop_minwage,consumer_sentiment)
colnames(var_tseries1)<-c("sales_unitboxes","max_temperature","itaee_growth","pop_minwage","consumer_sentiment")
  • VARselect computes the information criteria for each lag order up to lag.max. The AIC is minimized at 5 lags, but we consider 5 lags too many, so we choose 1 lag, which has the second-lowest AIC and is the order selected by HQ, SC and FPE
lagselect1<-VARselect(var_tseries1,lag.max=5,type="const")
lagselect1$selection
## AIC(n)  HQ(n)  SC(n) FPE(n) 
##      5      1      1      1
lagselect1$criteria
##                   1            2            3            4            5
## AIC(n) 1.731302e+01 1.766279e+01 1.776954e+01 1.747702e+01 1.681127e+01
## HQ(n)  1.776614e+01 1.849351e+01 1.897787e+01 1.906295e+01 1.877480e+01
## SC(n)  1.854176e+01 1.991548e+01 2.104619e+01 2.177762e+01 2.213583e+01
## FPE(n) 3.333708e+07 4.966878e+07 6.290342e+07 6.137404e+07 5.212851e+07
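
As a rough check of how the selection is read from the criteria table, the sketch below re-creates the printed matrix by hand (the object `criteria` is a hand-typed copy, not the original `lagselect1$criteria`) and picks the lag that minimizes each row.

```r
# Hand-typed copy of the criteria printed above (rows = criterion, cols = lag)
criteria <- rbind(
  AIC = c(17.31302, 17.66279, 17.76954, 17.47702, 16.81127),
  HQ  = c(17.76614, 18.49351, 18.97787, 19.06295, 18.77480),
  SC  = c(18.54176, 19.91548, 21.04619, 21.77762, 22.13583),
  FPE = c(3.333708e7, 4.966878e7, 6.290342e7, 6.137404e7, 5.212851e7)
)
colnames(criteria) <- 1:5

# Lag order that minimizes each criterion
best_lag <- apply(criteria, 1, which.min)
print(best_lag)

# Lags ranked by AIC, from lowest to highest
aic_ranking <- order(criteria["AIC", ])
print(aic_ranking)
```

The first entry of `aic_ranking` is the AIC-preferred lag and the second entry is the runner-up, which is the one kept for the model.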
  • We estimate the VAR model. The p option refers to the number of lags used. In the sales_unitboxes equation, max_temperature is statistically significant at the 5% level, while pop_minwage and consumer_sentiment are significant only at the 10% level.
var_model1<-VAR(var_tseries1,p=1,type="const",season=NULL,exog=NULL) 
summary(var_model1)
## 
## VAR Estimation Results:
## ========================= 
## Endogenous variables: sales_unitboxes, max_temperature, itaee_growth, pop_minwage, consumer_sentiment 
## Deterministic variables: const 
## Sample size: 47 
## Log Likelihood: -707.645 
## Roots of the characteristic polynomial:
## 0.8681 0.8681 0.6131 0.485 0.485
## Call:
## VAR(y = var_tseries1, p = 1, type = "const", exogen = NULL)
## 
## 
## Estimation results for equation sales_unitboxes: 
## ================================================ 
## sales_unitboxes = sales_unitboxes.l1 + max_temperature.l1 + itaee_growth.l1 + pop_minwage.l1 + consumer_sentiment.l1 + const 
## 
##                         Estimate Std. Error t value Pr(>|t|)    
## sales_unitboxes.l1     5.191e-02  1.422e-01   0.365   0.7170    
## max_temperature.l1     1.504e+05  3.146e+04   4.781 2.26e-05 ***
## itaee_growth.l1        1.293e+06  4.426e+06   0.292   0.7717    
## pop_minwage.l1         1.271e+05  6.494e+04   1.958   0.0571 .  
## consumer_sentiment.l1  4.265e+04  2.515e+04   1.696   0.0975 .  
## const                 -1.475e+06  1.432e+06  -1.030   0.3089    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## 
## Residual standard error: 407000 on 41 degrees of freedom
## Multiple R-Squared: 0.5731,  Adjusted R-squared: 0.521 
## F-statistic: 11.01 on 5 and 41 DF,  p-value: 9.201e-07 
## 
## 
## Estimation results for equation max_temperature: 
## ================================================ 
## max_temperature = sales_unitboxes.l1 + max_temperature.l1 + itaee_growth.l1 + pop_minwage.l1 + consumer_sentiment.l1 + const 
## 
##                         Estimate Std. Error t value Pr(>|t|)    
## sales_unitboxes.l1    -7.331e-07  6.795e-07  -1.079   0.2869    
## max_temperature.l1     7.179e-01  1.503e-01   4.776  2.3e-05 ***
## itaee_growth.l1        2.455e+01  2.114e+01   1.161   0.2522    
## pop_minwage.l1         3.730e-01  3.103e-01   1.202   0.2361    
## consumer_sentiment.l1 -1.601e-01  1.201e-01  -1.333   0.1900    
## const                  1.433e+01  6.840e+00   2.095   0.0424 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## 
## Residual standard error: 1.945 on 41 degrees of freedom
## Multiple R-Squared: 0.5149,  Adjusted R-squared: 0.4557 
## F-statistic: 8.703 on 5 and 41 DF,  p-value: 1.092e-05 
## 
## 
## Estimation results for equation itaee_growth: 
## ============================================= 
## itaee_growth = sales_unitboxes.l1 + max_temperature.l1 + itaee_growth.l1 + pop_minwage.l1 + consumer_sentiment.l1 + const 
## 
##                         Estimate Std. Error t value Pr(>|t|)    
## sales_unitboxes.l1     1.553e-09  4.304e-09   0.361 0.720145    
## max_temperature.l1    -2.605e-04  9.522e-04  -0.274 0.785779    
## itaee_growth.l1        5.609e-01  1.339e-01   4.188 0.000146 ***
## pop_minwage.l1        -1.290e-03  1.965e-03  -0.656 0.515397    
## consumer_sentiment.l1 -3.534e-04  7.611e-04  -0.464 0.644815    
## const                  3.876e-02  4.333e-02   0.894 0.376306    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## 
## Residual standard error: 0.01232 on 41 degrees of freedom
## Multiple R-Squared: 0.3597,  Adjusted R-squared: 0.2816 
## F-statistic: 4.606 on 5 and 41 DF,  p-value: 0.001997 
## 
## 
## Estimation results for equation pop_minwage: 
## ============================================ 
## pop_minwage = sales_unitboxes.l1 + max_temperature.l1 + itaee_growth.l1 + pop_minwage.l1 + consumer_sentiment.l1 + const 
## 
##                         Estimate Std. Error t value Pr(>|t|)    
## sales_unitboxes.l1     2.832e-07  1.084e-07   2.611  0.01254 *  
## max_temperature.l1    -8.494e-02  2.399e-02  -3.541  0.00101 ** 
## itaee_growth.l1       -1.338e+00  3.374e+00  -0.397  0.69376    
## pop_minwage.l1         8.964e-01  4.951e-02  18.104  < 2e-16 ***
## consumer_sentiment.l1 -4.429e-02  1.917e-02  -2.310  0.02601 *  
## const                  3.640e+00  1.092e+00   3.335  0.00182 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## 
## Residual standard error: 0.3103 on 41 degrees of freedom
## Multiple R-Squared: 0.9127,  Adjusted R-squared: 0.902 
## F-statistic: 85.68 on 5 and 41 DF,  p-value: < 2.2e-16 
## 
## 
## Estimation results for equation consumer_sentiment: 
## =================================================== 
## consumer_sentiment = sales_unitboxes.l1 + max_temperature.l1 + itaee_growth.l1 + pop_minwage.l1 + consumer_sentiment.l1 + const 
## 
##                         Estimate Std. Error t value Pr(>|t|)    
## sales_unitboxes.l1    -1.106e-06  5.480e-07  -2.017  0.05024 .  
## max_temperature.l1     3.367e-01  1.212e-01   2.777  0.00823 ** 
## itaee_growth.l1       -4.648e+00  1.705e+01  -0.273  0.78654    
## pop_minwage.l1         4.345e-01  2.502e-01   1.736  0.09001 .  
## consumer_sentiment.l1  1.004e+00  9.689e-02  10.365 5.07e-13 ***
## const                 -7.836e+00  5.516e+00  -1.420  0.16304    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## 
## Residual standard error: 1.568 on 41 degrees of freedom
## Multiple R-Squared: 0.7424,  Adjusted R-squared: 0.711 
## F-statistic: 23.63 on 5 and 41 DF,  p-value: 4.202e-11 
## 
## 
## 
## Covariance matrix of residuals:
##                    sales_unitboxes max_temperature itaee_growth pop_minwage
## sales_unitboxes          1.657e+11       2.527e+05   -1.115e+03   1.010e+04
## max_temperature          2.527e+05       3.782e+00   -1.919e-03   4.525e-02
## itaee_growth            -1.115e+03      -1.919e-03    1.518e-04  -7.279e-04
## pop_minwage              1.010e+04       4.525e-02   -7.279e-04   9.631e-02
## consumer_sentiment       5.019e+04      -6.989e-01   -2.138e-03  -7.704e-02
##                    consumer_sentiment
## sales_unitboxes             5.019e+04
## max_temperature            -6.989e-01
## itaee_growth               -2.138e-03
## pop_minwage                -7.704e-02
## consumer_sentiment          2.460e+00
## 
## Correlation matrix of residuals:
##                    sales_unitboxes max_temperature itaee_growth pop_minwage
## sales_unitboxes            1.00000         0.31925     -0.22240     0.07997
## max_temperature            0.31925         1.00000     -0.08012     0.07497
## itaee_growth              -0.22240        -0.08012      1.00000    -0.19040
## pop_minwage                0.07997         0.07497     -0.19040     1.00000
## consumer_sentiment         0.07863        -0.22916     -0.11068    -0.15828
##                    consumer_sentiment
## sales_unitboxes               0.07863
## max_temperature              -0.22916
## itaee_growth                 -0.11068
## pop_minwage                  -0.15828
## consumer_sentiment            1.00000
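
To make the structure of the output above easier to read, here is a minimal sketch of what one VAR(1) equation is: an ordinary regression of a variable on the first lag of every variable in the system. It uses simulated data with two made-up series (`y1`, `y2`) and base R `lm()`; it is an illustration under those assumptions and not the code used to estimate `var_model1`.

```r
set.seed(42)
n <- 500

# Simulate a two-variable system where y2 feeds into y1 with one lag
y1 <- numeric(n)
y2 <- numeric(n)
for (t in 2:n) {
  y1[t] <- 0.5 * y1[t - 1] + 0.3 * y2[t - 1] + rnorm(1)
  y2[t] <- 0.4 * y2[t - 1] + rnorm(1)
}

# Align each observation with the first lag of both variables
lagged <- data.frame(
  y1    = y1[-1],
  y2    = y2[-1],
  y1_l1 = y1[-n],
  y2_l1 = y2[-n]
)

# One OLS regression per equation, with the same right-hand side in each
eq_y1 <- lm(y1 ~ y1_l1 + y2_l1, data = lagged)
eq_y2 <- lm(y2 ~ y1_l1 + y2_l1, data = lagged)

coef_y1 <- coef(eq_y1)
print(round(coef_y1, 3))
```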

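Granger causality test of sales_unitboxes against all the other variables. Since the p-value of the Granger test (0.01597) is below 0.05, we reject the null hypothesis that sales_unitboxes do not Granger-cause the other variables.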

granger_coca<-causality(var_model1,cause="sales_unitboxes")
granger_coca
## $Granger
## 
##  Granger causality H0: sales_unitboxes do not Granger-cause
##  max_temperature itaee_growth pop_minwage consumer_sentiment
## 
## data:  VAR object var_model1
## F-Test = 3.1251, df1 = 4, df2 = 205, p-value = 0.01597
## 
## 
## $Instant
## 
##  H0: No instantaneous causality between: sales_unitboxes and
##  max_temperature itaee_growth pop_minwage consumer_sentiment
## 
## data:  VAR object var_model1
## Chi-squared = 6.4797, df = 4, p-value = 0.1661
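
The idea behind the Granger test can be sketched with a simpler two-variable case: compare a regression of `y` on its own lag with one that also includes the lag of `x`, and test whether the extra term helps. The example below uses simulated data and base R only; it is a simplified bivariate F-test, and it is not claimed to reproduce the joint test that `causality()` computes for the whole VAR.

```r
set.seed(1)
n <- 200

# x is pure noise; y depends on its own lag and on the lag of x
x <- rnorm(n)
y <- numeric(n)
for (t in 2:n) {
  y[t] <- 0.5 * y[t - 1] + 0.8 * x[t - 1] + rnorm(1)
}

d <- data.frame(y = y[-1], y_l1 = y[-n], x_l1 = x[-n])

# Restricted model: only the own lag. Unrestricted model: adds the lag of x.
restricted   <- lm(y ~ y_l1, data = d)
unrestricted <- lm(y ~ y_l1 + x_l1, data = d)

# F-test of H0: the lag of x adds no predictive power for y
gc <- anova(restricted, unrestricted)
p_value <- gc$`Pr(>F)`[2]
print(p_value)
```

A p-value below 0.05 leads to rejecting H0, which is the same decision rule applied to the output above.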

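Transform non-stationary time series variables by taking the first difference of the logarithm. In the lag selection below, AIC and FPE are minimized at 5 lags while HQ and SC select 1 lag; since we consider 5 lags to be too many, we estimate the model with 2 lags.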

diff_sales_unitboxes <- diff(log(sales_unitboxes))
diff_itaee_growth<-diff(log(itaee_growth))
diff_unemp_rate<-diff(log(unemp_rate))
diff_consumer_sentiment<-diff(log(consumer_sentiment))
diff_max_temperature <- diff(log(max_temperature))
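
As a small worked example of what `diff(log(x))` does, the sketch below uses a made-up series that grows 10% per period. The numbers are illustrative only. Two points are worth keeping in mind: the transformed series has one observation fewer than the original, and `log()` needs strictly positive values.

```r
# Made-up series growing 10% each period
x <- c(100, 110, 121, 133.1)

# First difference of the logarithm: approximately the growth rate
growth <- diff(log(x))

# Exact percentage change, for comparison
pct_change <- diff(x) / head(x, -1)

print(round(growth, 4))
print(round(pct_change, 4))
print(length(x) - length(growth))  # one observation is lost
```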

var_tseries2<-cbind(diff_sales_unitboxes, diff_itaee_growth,diff_unemp_rate,diff_consumer_sentiment,diff_max_temperature)

colnames(var_tseries2)<-cbind("sales_unit_boxes", "itaee_growth","unemp_rate","consumer_sentiment","max_temperature")

lagselect2<-VARselect(var_tseries2,lag.max=5,type="const")
lagselect2$selection
## AIC(n)  HQ(n)  SC(n) FPE(n) 
##      5      1      1      5
lagselect2$criteria
##                    1             2             3             4             5
## AIC(n) -2.195882e+01 -2.147906e+01 -2.182207e+01 -2.182334e+01 -2.310243e+01
## HQ(n)  -2.150388e+01 -2.064499e+01 -2.060888e+01 -2.023103e+01 -2.113100e+01
## SC(n)  -2.071763e+01 -1.920354e+01 -1.851223e+01 -1.747917e+01 -1.772393e+01
## FPE(n)  2.935464e-10  4.999133e-10  4.079544e-10  5.449744e-10  2.631452e-10

Specify model

var_model2<-VAR(var_tseries2,p=2,type="const",season=NULL,exog=NULL) 
summary(var_model2)
## 
## VAR Estimation Results:
## ========================= 
## Endogenous variables: sales_unit_boxes, itaee_growth, unemp_rate, consumer_sentiment, max_temperature 
## Deterministic variables: const 
## Sample size: 45 
## Log Likelihood: 226.356 
## Roots of the characteristic polynomial:
## 0.6449 0.6449 0.6279 0.6279 0.5687 0.5687 0.4648 0.4648 0.3648 0.3648
## Call:
## VAR(y = var_tseries2, p = 2, type = "const", exogen = NULL)
## 
## 
## Estimation results for equation sales_unit_boxes: 
## ================================================= 
## sales_unit_boxes = sales_unit_boxes.l1 + itaee_growth.l1 + unemp_rate.l1 + consumer_sentiment.l1 + max_temperature.l1 + sales_unit_boxes.l2 + itaee_growth.l2 + unemp_rate.l2 + consumer_sentiment.l2 + max_temperature.l2 + const 
## 
##                        Estimate Std. Error t value Pr(>|t|)   
## sales_unit_boxes.l1   -0.563053   0.177327  -3.175  0.00318 **
## itaee_growth.l1        0.002355   0.020010   0.118  0.90701   
## unemp_rate.l1          0.040559   0.149473   0.271  0.78777   
## consumer_sentiment.l1  0.678038   0.279016   2.430  0.02052 * 
## max_temperature.l1     0.675570   0.200245   3.374  0.00187 **
## sales_unit_boxes.l2   -0.195086   0.165309  -1.180  0.24614   
## itaee_growth.l2       -0.002210   0.020537  -0.108  0.91492   
## unemp_rate.l2         -0.182797   0.143861  -1.271  0.21248   
## consumer_sentiment.l2 -0.190435   0.302412  -0.630  0.53309   
## max_temperature.l2     0.443002   0.208280   2.127  0.04076 * 
## const                  0.003901   0.011480   0.340  0.73607   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## 
## Residual standard error: 0.07556 on 34 degrees of freedom
## Multiple R-Squared: 0.4546,  Adjusted R-squared: 0.2942 
## F-statistic: 2.834 on 10 and 34 DF,  p-value: 0.01132 
## 
## 
## Estimation results for equation itaee_growth: 
## ============================================= 
## itaee_growth = sales_unit_boxes.l1 + itaee_growth.l1 + unemp_rate.l1 + consumer_sentiment.l1 + max_temperature.l1 + sales_unit_boxes.l2 + itaee_growth.l2 + unemp_rate.l2 + consumer_sentiment.l2 + max_temperature.l2 + const 
## 
##                        Estimate Std. Error t value Pr(>|t|)
## sales_unit_boxes.l1    1.692690   1.529816   1.106    0.276
## itaee_growth.l1       -0.006317   0.172632  -0.037    0.971
## unemp_rate.l1          0.849594   1.289514   0.659    0.514
## consumer_sentiment.l1 -2.576313   2.407091  -1.070    0.292
## max_temperature.l1    -2.075133   1.727528  -1.201    0.238
## sales_unit_boxes.l2    0.615984   1.426133   0.432    0.669
## itaee_growth.l2       -0.036686   0.177173  -0.207    0.837
## unemp_rate.l2          0.154218   1.241101   0.124    0.902
## consumer_sentiment.l2 -3.120630   2.608926  -1.196    0.240
## max_temperature.l2    -0.545550   1.796842  -0.304    0.763
## const                 -0.008557   0.099042  -0.086    0.932
## 
## 
## Residual standard error: 0.6519 on 34 degrees of freedom
## Multiple R-Squared: 0.08297, Adjusted R-squared: -0.1867 
## F-statistic: 0.3076 on 10 and 34 DF,  p-value: 0.974 
## 
## 
## Estimation results for equation unemp_rate: 
## =========================================== 
## unemp_rate = sales_unit_boxes.l1 + itaee_growth.l1 + unemp_rate.l1 + consumer_sentiment.l1 + max_temperature.l1 + sales_unit_boxes.l2 + itaee_growth.l2 + unemp_rate.l2 + consumer_sentiment.l2 + max_temperature.l2 + const 
## 
##                        Estimate Std. Error t value Pr(>|t|)   
## sales_unit_boxes.l1    0.268492   0.189143   1.420  0.16485   
## itaee_growth.l1        0.031321   0.021344   1.467  0.15145   
## unemp_rate.l1         -0.514789   0.159433  -3.229  0.00275 **
## consumer_sentiment.l1  0.605996   0.297607   2.036  0.04958 * 
## max_temperature.l1    -0.046973   0.213588  -0.220  0.82725   
## sales_unit_boxes.l2    0.318124   0.176324   1.804  0.08007 . 
## itaee_growth.l2        0.006572   0.021905   0.300  0.76600   
## unemp_rate.l2         -0.365214   0.153447  -2.380  0.02306 * 
## consumer_sentiment.l2 -0.045120   0.322562  -0.140  0.88958   
## max_temperature.l2    -0.347949   0.222157  -1.566  0.12656   
## const                 -0.011659   0.012245  -0.952  0.34776   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## 
## Residual standard error: 0.08059 on 34 degrees of freedom
## Multiple R-Squared: 0.3981,  Adjusted R-squared: 0.221 
## F-statistic: 2.249 on 10 and 34 DF,  p-value: 0.03839 
## 
## 
## Estimation results for equation consumer_sentiment: 
## =================================================== 
## consumer_sentiment = sales_unit_boxes.l1 + itaee_growth.l1 + unemp_rate.l1 + consumer_sentiment.l1 + max_temperature.l1 + sales_unit_boxes.l2 + itaee_growth.l2 + unemp_rate.l2 + consumer_sentiment.l2 + max_temperature.l2 + const 
## 
##                         Estimate Std. Error t value Pr(>|t|)  
## sales_unit_boxes.l1   -0.2018097  0.1122488  -1.798   0.0811 .
## itaee_growth.l1       -0.0101777  0.0126667  -0.803   0.4273  
## unemp_rate.l1          0.1005459  0.0946168   1.063   0.2954  
## consumer_sentiment.l1 -0.0126064  0.1766179  -0.071   0.9435  
## max_temperature.l1     0.1862236  0.1267557   1.469   0.1510  
## sales_unit_boxes.l2   -0.0026797  0.1046411  -0.026   0.9797  
## itaee_growth.l2        0.0004688  0.0129999   0.036   0.9714  
## unemp_rate.l2         -0.0909286  0.0910646  -0.999   0.3251  
## consumer_sentiment.l2  0.0090747  0.1914274   0.047   0.9625  
## max_temperature.l2     0.2573765  0.1318415   1.952   0.0592 .
## const                  0.0042831  0.0072671   0.589   0.5595  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## 
## Residual standard error: 0.04783 on 34 degrees of freedom
## Multiple R-Squared: 0.2566,  Adjusted R-squared: 0.03792 
## F-statistic: 1.173 on 10 and 34 DF,  p-value: 0.3417 
## 
## 
## Estimation results for equation max_temperature: 
## ================================================ 
## max_temperature = sales_unit_boxes.l1 + itaee_growth.l1 + unemp_rate.l1 + consumer_sentiment.l1 + max_temperature.l1 + sales_unit_boxes.l2 + itaee_growth.l2 + unemp_rate.l2 + consumer_sentiment.l2 + max_temperature.l2 + const 
## 
##                        Estimate Std. Error t value Pr(>|t|)
## sales_unit_boxes.l1    0.085163   0.170127   0.501    0.620
## itaee_growth.l1        0.011969   0.019198   0.623    0.537
## unemp_rate.l1          0.098539   0.143404   0.687    0.497
## consumer_sentiment.l1 -0.064109   0.267687  -0.239    0.812
## max_temperature.l1     0.137291   0.192114   0.715    0.480
## sales_unit_boxes.l2   -0.012406   0.158597  -0.078    0.938
## itaee_growth.l2        0.014476   0.019703   0.735    0.468
## unemp_rate.l2         -0.109129   0.138020  -0.791    0.435
## consumer_sentiment.l2 -0.264774   0.290132  -0.913    0.368
## max_temperature.l2    -0.159065   0.199822  -0.796    0.432
## const                 -0.001181   0.011014  -0.107    0.915
## 
## 
## Residual standard error: 0.07249 on 34 degrees of freedom
## Multiple R-Squared: 0.1656,  Adjusted R-squared: -0.07978 
## F-statistic: 0.6749 on 10 and 34 DF,  p-value: 0.7395 
## 
## 
## 
## Covariance matrix of residuals:
##                    sales_unit_boxes itaee_growth unemp_rate consumer_sentiment
## sales_unit_boxes           0.005709    -0.013349  0.0016036          0.0004540
## itaee_growth              -0.013349     0.424914 -0.0034855         -0.0024233
## unemp_rate                 0.001604    -0.003485  0.0064954          0.0004427
## consumer_sentiment         0.000454    -0.002423  0.0004427          0.0022876
## max_temperature            0.002068    -0.008945  0.0009411         -0.0008576
##                    max_temperature
## sales_unit_boxes         0.0020682
## itaee_growth            -0.0089447
## unemp_rate               0.0009411
## consumer_sentiment      -0.0008576
## max_temperature          0.0052550
## 
## Correlation matrix of residuals:
##                    sales_unit_boxes itaee_growth unemp_rate consumer_sentiment
## sales_unit_boxes             1.0000     -0.27103    0.26334            0.12563
## itaee_growth                -0.2710      1.00000   -0.06635           -0.07772
## unemp_rate                   0.2633     -0.06635    1.00000            0.11484
## consumer_sentiment           0.1256     -0.07772    0.11484            1.00000
## max_temperature              0.3776     -0.18929    0.16109           -0.24734
##                    max_temperature
## sales_unit_boxes            0.3776
## itaee_growth               -0.1893
## unemp_rate                  0.1611
## consumer_sentiment         -0.2473
## max_temperature             1.0000
  • Since the p-value (0.3183) is above 0.05, we fail to reject the null hypothesis: sales_unit_boxes do not Granger-cause itaee_growth, unemp_rate, consumer_sentiment and max_temperature, so it is not a bidirectional relationship
granger_coca1<-causality(var_model2,cause="sales_unit_boxes")
granger_coca1
## $Granger
## 
##  Granger causality H0: sales_unit_boxes do not Granger-cause
##  itaee_growth unemp_rate consumer_sentiment max_temperature
## 
## data:  VAR object var_model2
## F-Test = 1.1727, df1 = 8, df2 = 170, p-value = 0.3183
## 
## 
## $Instant
## 
##  H0: No instantaneous causality between: sales_unit_boxes and
##  itaee_growth unemp_rate consumer_sentiment max_temperature
## 
## data:  VAR object var_model2
## Chi-squared = 9.0746, df = 4, p-value = 0.05926

7 Select the regression model that better fits and model diagnostics

To select the best model we will take into consideration the Akaike information criterion (AIC) and some model diagnostics such as the Ljung-Box test, the R squared and the number of statistically significant variables, but only where the diagnostics apply to the model.

First we compare the Ljung-Box test for the ARMA and ARIMA models. The p-values are 0.9991 for the ARMA model and 0.8713 for the ARIMA model. For both models we fail to reject the null hypothesis of no autocorrelation in the residuals, since the p-value is > 0.05, so we conclude that the models do not show lack of fit.

Box.test(ARMA.residuals,lag=1,type="Ljung-Box")
## 
##  Box-Ljung test
## 
## data:  ARMA.residuals
## X-squared = 1.2684e-06, df = 1, p-value = 0.9991
Box.test(ARIMA.mydatar$residuals,lag=1,type="Ljung-Box")
## 
##  Box-Ljung test
## 
## data:  ARIMA.mydatar$residuals
## X-squared = 0.026233, df = 1, p-value = 0.8713
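
The decision rule for the Ljung-Box test can be sketched as follows. The two series are simulated (white noise and a random walk) and stand in for model residuals; the helper `decide()` is a hypothetical name introduced only for this illustration.

```r
set.seed(7)

# Two synthetic series standing in for model residuals
white_noise <- rnorm(100)          # no autocorrelation by construction
random_walk <- cumsum(rnorm(100))  # strongly autocorrelated

lb_noise <- Box.test(white_noise, lag = 1, type = "Ljung-Box")
lb_walk  <- Box.test(random_walk, lag = 1, type = "Ljung-Box")

# H0: no autocorrelation. A p-value above alpha means we fail to reject H0.
decide <- function(p, alpha = 0.05) {
  if (p > alpha) "fail to reject H0" else "reject H0"
}

print(decide(lb_walk$p.value))
print(decide(0.9991))  # ARMA residuals, p-value from the output above
print(decide(0.8713))  # ARIMA residuals, p-value from the output above
```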

Here we evaluate the Akaike information criterion (AIC) for each of our models, ARMA, ARIMA and VAR (with and without logarithm):

  • Model 1 ARMA(1,1): 1400.62

  • Model 2 ARIMA (1,1,1) : -97.31

  • Model 3 VAR(no log) : 1.73e+01

  • Model 3.1 VAR(log): -2.15e+01

  • Of the two VAR models, Model 3.1, which uses the logarithmic transformation, has the lower AIC. However, to choose the model that fits best we will also take the R squared into consideration

ADJUSTED R SQUARED (sales equation)

MODEL 3 VAR: 52.1%

MODEL 3.1 VAR: 29.4%

Taking this into consideration, and given that both models have statistically significant variables that affect sales over time, we choose Model 3.1 VAR (with log) as the best model, because it has the lower AIC and is estimated on the transformed variables, even though its adjusted R squared is lower than that of Model 3. The R squared measures the proportion of the variance of the dependent variable that is explained by the independent variables.
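
As a check on the adjusted R squared values, the sketch below recomputes them from the R squared, sample size and number of regressors printed in the two VAR summaries for the sales equation. The helper `adj_r2()` is a hypothetical name written for this illustration; it applies the standard formula 1 - (1 - R²)(n - 1)/(n - k - 1).

```r
# Adjusted R squared from R squared, sample size n and number of regressors k
adj_r2 <- function(r2, n, k) {
  1 - (1 - r2) * (n - 1) / (n - k - 1)
}

# var_model1 (levels, p = 1): R2 = 0.5731, n = 47, k = 5 lagged regressors
adj_model1 <- adj_r2(0.5731, n = 47, k = 5)

# var_model2 (log-differenced, p = 2): R2 = 0.4546, n = 45, k = 10 lagged regressors
adj_model2 <- adj_r2(0.4546, n = 45, k = 10)

print(round(c(var_model1 = adj_model1, var_model2 = adj_model2), 4))
```

The results match the "Adjusted R-squared" lines in the two summaries (0.521 and 0.2942). Because the two models have different dependent variables (levels versus log differences), these values should be compared with some caution.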

7.1 Interpret the time series regression of analysis of Model 3.1 VAR

7.1.1 Time Series Plots

Here we have our first plots, one for each of the independent and dependent variables, showing its behavior over the period 2015-2018. The variables where we can observe trends and the data is non-stationary are:

  • Consumer Sentiment
  • CPI
  • Inflation Rate
  • Unemployment Rate
  • GDP per capita
  • itaee growth
  • population and job density
  • minimum wage
  • exchange rate

The variables where we see a constant mean over time, that is, stationary time series data, are:

  • Itaee
  • Maximum Temperature
  • Holiday Month
  • Sales unit Boxes

The most compelling plots from our variables are :

  • Consumer sentiment, because we see a constant trend but a low peak around 2017
  • Inflation rate, a non-stationary variable where we see the lowest and highest peaks around 2017
  • Unemployment rate, also a non-stationary variable, which tends to decrease over the period
  • Max temperature, a stationary variable with a constant mean over time, where we see a pattern of high and then low peaks
par(mfrow=c(3,3))
plot(coca3$date,coca3$consumer_sentiment,type="l",col="blue",lwd=2,xlab="Date",ylab="consumer_sentiment",main="consumer_sentiment")
plot(coca3$date,coca3$CPI,type="l",col="blue",lwd=2,xlab="Date",ylab="CPI",main="CPI Rate")
plot(coca3$date,coca3$inflation_rate,type="l",col="blue",lwd=2,xlab="Date",ylab="Inflation",main="Inflation Rate")
plot(coca3$date,coca3$unemp_rate,type="l",col="blue",lwd=2,xlab="Date",ylab="unemp_rate",main="unemp_rate")
plot(coca3$date,coca3$gdp_percapita,type="l",col="blue",lwd=2,xlab="Date",ylab="gdp_percapita",main="gdp_percapita")
plot(coca3$date,coca3$itaee,type="l",col="blue",lwd=2,xlab="Date",ylab="itaee",main="itaee")
plot(coca3$date,coca3$itaee_growth,type="l",col="blue",lwd=2,xlab="Date",ylab="itaee_growth",main="itaee_growth")
plot(coca3$date,coca3$pop_density,type="l",col="blue",lwd=2,xlab="Date",ylab="pop_density",main="pop_density")

plot(coca3$date,coca3$job_density,type="l",col="blue",lwd=2,xlab="Date",ylab="job_density",main="job_density")
plot(coca3$date,coca3$pop_minwage,type="l",col="blue",lwd=2,xlab="Date",ylab="pop_minwage",main="pop_minwage")
plot(coca3$date,coca3$exchange_rate,type="l",col="blue",lwd=2,xlab="Date",ylab="exchange_rate",main="exchange_rate")
plot(coca3$date,coca3$max_temperature,type="l",col="blue",lwd=2,xlab="Date",ylab="max_temperature",main="max_temperature")
plot(coca3$date,coca3$holiday_month,type="l",col="blue",lwd=2,xlab="Date",ylab="holiday_month",main="holiday_month")
plot(coca3$date,coca3$sales_unitboxes,type="l",col="blue",lwd=2,xlab="Date",ylab="sales_unitboxes",main="sales_unitboxes")

7.1.2 Alternative plots

Here we have a more specific graph to observe the behavior of sales from 2015 to 2018. We can clearly observe a seasonal pattern in this stationary series: most of the low points of sales unit boxes occur around the beginning of the year, while the highest peaks occur around the middle of the year.

ts_plot(sales_unitboxes)

It is important to assess whether the variables under study are stationary or not. As we mentioned earlier, only 4 variables are stationary while the other 10 are non-stationary.

For our model we chose 5 variables:

  • sales_unitboxes
  • max_temperature
  • itaee_growth
  • pop_minwage
  • consumer_sentiment

We chose these variables because of the patterns we saw in our earlier plots, since they were the most compelling at first sight. Then we ran an ADF test in order to assess which series were stationary and which were non-stationary. In order to make all our variables stationary, we took the first difference of the logarithm of each one, so that the statistical properties of the series do not change over time. From the lag selection results, we decided to use 2 lags for our model, since we consider 5 lags to be too many to analyze.

lagselect2<-VARselect(var_tseries2,lag.max=5,type="const")
lagselect2$selection
## AIC(n)  HQ(n)  SC(n) FPE(n) 
##      5      1      1      5
lagselect2$criteria
##                    1             2             3             4             5
## AIC(n) -2.195882e+01 -2.147906e+01 -2.182207e+01 -2.182334e+01 -2.310243e+01
## HQ(n)  -2.150388e+01 -2.064499e+01 -2.060888e+01 -2.023103e+01 -2.113100e+01
## SC(n)  -2.071763e+01 -1.920354e+01 -1.851223e+01 -1.747917e+01 -1.772393e+01
## FPE(n)  2.935464e-10  4.999133e-10  4.079544e-10  5.449744e-10  2.631452e-10
  • In the sales_unit_boxes equation, the statistically significant variables are maximum temperature at both lags, consumer sentiment at the first lag, and the first lag of sales itself
var_model2<-VAR(var_tseries2,p=2,type="const",season=NULL,exog=NULL) 
summary(var_model2)
## 
## VAR Estimation Results:
## ========================= 
## Endogenous variables: sales_unit_boxes, itaee_growth, unemp_rate, consumer_sentiment, max_temperature 
## Deterministic variables: const 
## Sample size: 45 
## Log Likelihood: 226.356 
## Roots of the characteristic polynomial:
## 0.6449 0.6449 0.6279 0.6279 0.5687 0.5687 0.4648 0.4648 0.3648 0.3648
## Call:
## VAR(y = var_tseries2, p = 2, type = "const", exogen = NULL)
## 
## 
## Estimation results for equation sales_unit_boxes: 
## ================================================= 
## sales_unit_boxes = sales_unit_boxes.l1 + itaee_growth.l1 + unemp_rate.l1 + consumer_sentiment.l1 + max_temperature.l1 + sales_unit_boxes.l2 + itaee_growth.l2 + unemp_rate.l2 + consumer_sentiment.l2 + max_temperature.l2 + const 
## 
##                        Estimate Std. Error t value Pr(>|t|)   
## sales_unit_boxes.l1   -0.563053   0.177327  -3.175  0.00318 **
## itaee_growth.l1        0.002355   0.020010   0.118  0.90701   
## unemp_rate.l1          0.040559   0.149473   0.271  0.78777   
## consumer_sentiment.l1  0.678038   0.279016   2.430  0.02052 * 
## max_temperature.l1     0.675570   0.200245   3.374  0.00187 **
## sales_unit_boxes.l2   -0.195086   0.165309  -1.180  0.24614   
## itaee_growth.l2       -0.002210   0.020537  -0.108  0.91492   
## unemp_rate.l2         -0.182797   0.143861  -1.271  0.21248   
## consumer_sentiment.l2 -0.190435   0.302412  -0.630  0.53309   
## max_temperature.l2     0.443002   0.208280   2.127  0.04076 * 
## const                  0.003901   0.011480   0.340  0.73607   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## 
## Residual standard error: 0.07556 on 34 degrees of freedom
## Multiple R-Squared: 0.4546,  Adjusted R-squared: 0.2942 
## F-statistic: 2.834 on 10 and 34 DF,  p-value: 0.01132 
## 
## 
## Estimation results for equation itaee_growth: 
## ============================================= 
## itaee_growth = sales_unit_boxes.l1 + itaee_growth.l1 + unemp_rate.l1 + consumer_sentiment.l1 + max_temperature.l1 + sales_unit_boxes.l2 + itaee_growth.l2 + unemp_rate.l2 + consumer_sentiment.l2 + max_temperature.l2 + const 
## 
##                        Estimate Std. Error t value Pr(>|t|)
## sales_unit_boxes.l1    1.692690   1.529816   1.106    0.276
## itaee_growth.l1       -0.006317   0.172632  -0.037    0.971
## unemp_rate.l1          0.849594   1.289514   0.659    0.514
## consumer_sentiment.l1 -2.576313   2.407091  -1.070    0.292
## max_temperature.l1    -2.075133   1.727528  -1.201    0.238
## sales_unit_boxes.l2    0.615984   1.426133   0.432    0.669
## itaee_growth.l2       -0.036686   0.177173  -0.207    0.837
## unemp_rate.l2          0.154218   1.241101   0.124    0.902
## consumer_sentiment.l2 -3.120630   2.608926  -1.196    0.240
## max_temperature.l2    -0.545550   1.796842  -0.304    0.763
## const                 -0.008557   0.099042  -0.086    0.932
## 
## 
## Residual standard error: 0.6519 on 34 degrees of freedom
## Multiple R-Squared: 0.08297, Adjusted R-squared: -0.1867 
## F-statistic: 0.3076 on 10 and 34 DF,  p-value: 0.974 
## 
## 
## Estimation results for equation unemp_rate: 
## =========================================== 
## unemp_rate = sales_unit_boxes.l1 + itaee_growth.l1 + unemp_rate.l1 + consumer_sentiment.l1 + max_temperature.l1 + sales_unit_boxes.l2 + itaee_growth.l2 + unemp_rate.l2 + consumer_sentiment.l2 + max_temperature.l2 + const 
## 
##                        Estimate Std. Error t value Pr(>|t|)   
## sales_unit_boxes.l1    0.268492   0.189143   1.420  0.16485   
## itaee_growth.l1        0.031321   0.021344   1.467  0.15145   
## unemp_rate.l1         -0.514789   0.159433  -3.229  0.00275 **
## consumer_sentiment.l1  0.605996   0.297607   2.036  0.04958 * 
## max_temperature.l1    -0.046973   0.213588  -0.220  0.82725   
## sales_unit_boxes.l2    0.318124   0.176324   1.804  0.08007 . 
## itaee_growth.l2        0.006572   0.021905   0.300  0.76600   
## unemp_rate.l2         -0.365214   0.153447  -2.380  0.02306 * 
## consumer_sentiment.l2 -0.045120   0.322562  -0.140  0.88958   
## max_temperature.l2    -0.347949   0.222157  -1.566  0.12656   
## const                 -0.011659   0.012245  -0.952  0.34776   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## 
## Residual standard error: 0.08059 on 34 degrees of freedom
## Multiple R-Squared: 0.3981,  Adjusted R-squared: 0.221 
## F-statistic: 2.249 on 10 and 34 DF,  p-value: 0.03839 
## 
## 
## Estimation results for equation consumer_sentiment: 
## =================================================== 
## consumer_sentiment = sales_unit_boxes.l1 + itaee_growth.l1 + unemp_rate.l1 + consumer_sentiment.l1 + max_temperature.l1 + sales_unit_boxes.l2 + itaee_growth.l2 + unemp_rate.l2 + consumer_sentiment.l2 + max_temperature.l2 + const 
## 
##                         Estimate Std. Error t value Pr(>|t|)  
## sales_unit_boxes.l1   -0.2018097  0.1122488  -1.798   0.0811 .
## itaee_growth.l1       -0.0101777  0.0126667  -0.803   0.4273  
## unemp_rate.l1          0.1005459  0.0946168   1.063   0.2954  
## consumer_sentiment.l1 -0.0126064  0.1766179  -0.071   0.9435  
## max_temperature.l1     0.1862236  0.1267557   1.469   0.1510  
## sales_unit_boxes.l2   -0.0026797  0.1046411  -0.026   0.9797  
## itaee_growth.l2        0.0004688  0.0129999   0.036   0.9714  
## unemp_rate.l2         -0.0909286  0.0910646  -0.999   0.3251  
## consumer_sentiment.l2  0.0090747  0.1914274   0.047   0.9625  
## max_temperature.l2     0.2573765  0.1318415   1.952   0.0592 .
## const                  0.0042831  0.0072671   0.589   0.5595  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## 
## Residual standard error: 0.04783 on 34 degrees of freedom
## Multiple R-Squared: 0.2566,  Adjusted R-squared: 0.03792 
## F-statistic: 1.173 on 10 and 34 DF,  p-value: 0.3417 
## 
## 
## Estimation results for equation max_temperature: 
## ================================================ 
## max_temperature = sales_unit_boxes.l1 + itaee_growth.l1 + unemp_rate.l1 + consumer_sentiment.l1 + max_temperature.l1 + sales_unit_boxes.l2 + itaee_growth.l2 + unemp_rate.l2 + consumer_sentiment.l2 + max_temperature.l2 + const 
## 
##                        Estimate Std. Error t value Pr(>|t|)
## sales_unit_boxes.l1    0.085163   0.170127   0.501    0.620
## itaee_growth.l1        0.011969   0.019198   0.623    0.537
## unemp_rate.l1          0.098539   0.143404   0.687    0.497
## consumer_sentiment.l1 -0.064109   0.267687  -0.239    0.812
## max_temperature.l1     0.137291   0.192114   0.715    0.480
## sales_unit_boxes.l2   -0.012406   0.158597  -0.078    0.938
## itaee_growth.l2        0.014476   0.019703   0.735    0.468
## unemp_rate.l2         -0.109129   0.138020  -0.791    0.435
## consumer_sentiment.l2 -0.264774   0.290132  -0.913    0.368
## max_temperature.l2    -0.159065   0.199822  -0.796    0.432
## const                 -0.001181   0.011014  -0.107    0.915
## 
## 
## Residual standard error: 0.07249 on 34 degrees of freedom
## Multiple R-Squared: 0.1656,  Adjusted R-squared: -0.07978 
## F-statistic: 0.6749 on 10 and 34 DF,  p-value: 0.7395 
## 
## 
## 
## Covariance matrix of residuals:
##                    sales_unit_boxes itaee_growth unemp_rate consumer_sentiment
## sales_unit_boxes           0.005709    -0.013349  0.0016036          0.0004540
## itaee_growth              -0.013349     0.424914 -0.0034855         -0.0024233
## unemp_rate                 0.001604    -0.003485  0.0064954          0.0004427
## consumer_sentiment         0.000454    -0.002423  0.0004427          0.0022876
## max_temperature            0.002068    -0.008945  0.0009411         -0.0008576
##                    max_temperature
## sales_unit_boxes         0.0020682
## itaee_growth            -0.0089447
## unemp_rate               0.0009411
## consumer_sentiment      -0.0008576
## max_temperature          0.0052550
## 
## Correlation matrix of residuals:
##                    sales_unit_boxes itaee_growth unemp_rate consumer_sentiment
## sales_unit_boxes             1.0000     -0.27103    0.26334            0.12563
## itaee_growth                -0.2710      1.00000   -0.06635           -0.07772
## unemp_rate                   0.2633     -0.06635    1.00000            0.11484
## consumer_sentiment           0.1256     -0.07772    0.11484            1.00000
## max_temperature              0.3776     -0.18929    0.16109           -0.24734
##                    max_temperature
## sales_unit_boxes            0.3776
## itaee_growth               -0.1893
## unemp_rate                  0.1611
## consumer_sentiment         -0.2473
## max_temperature             1.0000
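The residual correlation matrix above is the residual covariance matrix rescaled by the standard deviations on its diagonal. As a minimal sketch, the block below re-enters the printed covariance values by hand (so they carry the rounding of the printed output) and recovers the correlations with base R's cov2cor(). The object names resid_cov, resid_cor and manual_cor are illustrative and do not appear in the original analysis.

```r
# Residual covariance matrix, copied from the printed VAR summary (rounded values).
var_names <- c("sales_unit_boxes", "itaee_growth", "unemp_rate",
               "consumer_sentiment", "max_temperature")
resid_cov <- matrix(
  c( 0.005709,  -0.013349,   0.0016036,  0.0004540,  0.0020682,
    -0.013349,   0.424914,  -0.0034855, -0.0024233, -0.0089447,
     0.0016036, -0.0034855,  0.0064954,  0.0004427,  0.0009411,
     0.0004540, -0.0024233,  0.0004427,  0.0022876, -0.0008576,
     0.0020682, -0.0089447,  0.0009411, -0.0008576,  0.0052550),
  nrow = 5, byrow = TRUE, dimnames = list(var_names, var_names)
)

# cor_ij = cov_ij / sqrt(cov_ii * cov_jj), applied to every pair at once.
resid_cor <- cov2cor(resid_cov)
print(round(resid_cor, 4))

# The same calculation by hand for sales_unit_boxes and max_temperature.
manual_cor <- resid_cov["sales_unit_boxes", "max_temperature"] /
  sqrt(resid_cov["sales_unit_boxes", "sales_unit_boxes"] *
       resid_cov["max_temperature", "max_temperature"])
print(manual_cor)
```

In the sales_unit_boxes row, the largest residual correlation in absolute value is the one with max_temperature (0.3776).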
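  • The test below checks whether sales_unit_boxes Granger-causes itaee_growth, unemp_rate, consumer_sentiment and max_temperature. The null hypothesis of no Granger causality is not rejected (p-value = 0.3183), so the relationship is not bidirectional.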
granger_coca1 <- causality(var_model2, cause = "sales_unit_boxes")
granger_coca1
## $Granger
## 
##  Granger causality H0: sales_unit_boxes do not Granger-cause
##  itaee_growth unemp_rate consumer_sentiment max_temperature
## 
## data:  VAR object var_model2
## F-Test = 1.1727, df1 = 8, df2 = 170, p-value = 0.3183
## 
## 
## $Instant
## 
##  H0: No instantaneous causality between: sales_unit_boxes and
##  itaee_growth unemp_rate consumer_sentiment max_temperature
## 
## data:  VAR object var_model2
## Chi-squared = 9.0746, df = 4, p-value = 0.05926
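  • The relationship between variables can be unidirectional, bidirectional, or absent. Our Granger test does not reject the null hypothesis (F = 1.1727, p-value = 0.3183), so we find no evidence that sales_unit_boxes Granger-causes itaee_growth, unemp_rate, consumer_sentiment or max_temperature. The test for instantaneous causality is borderline (p-value = 0.05926): the null hypothesis is not rejected at the 5% level, but it would be rejected at the 10% level.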
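
As a rough illustration of what a Granger test checks, the sketch below runs a two-variable version by hand in base R on simulated data. The series x and y, their coefficients and the helper granger_p are hypothetical and are not part of the Coca-Cola Femsa dataset. The causality() call above follows the same idea in a multivariate setting: it tests whether the lags of one variable add explanatory power once the other lags are already in the model.

```r
# Simulated example (hypothetical data): x leads y by one period by construction.
set.seed(123)
n <- 200
x <- numeric(n)
y <- numeric(n)
for (i in 2:n) {
  x[i] <- 0.5 * x[i - 1] + rnorm(1)
  y[i] <- 0.3 * y[i - 1] + 0.8 * x[i - 1] + rnorm(1)
}

# Align each series with its own first lag and the other series' first lag.
d <- data.frame(y = y[-1], x = x[-1], y_l1 = y[-n], x_l1 = x[-n])

# Hypothetical helper: F-test comparing a model with only the target's own lag
# against a model that also includes the lag of the candidate cause.
granger_p <- function(target, own_lag, cause_lag, data) {
  restricted   <- lm(reformulate(own_lag, response = target), data = data)
  unrestricted <- lm(reformulate(c(own_lag, cause_lag), response = target), data = data)
  anova(restricted, unrestricted)[2, "Pr(>F)"]
}

p_x_to_y <- granger_p("y", "y_l1", "x_l1", d)  # H0: x does not Granger-cause y
p_y_to_x <- granger_p("x", "x_l1", "y_l1", d)  # H0: y does not Granger-cause x
print(c(x_to_y = p_x_to_y, y_to_x = p_y_to_x))
```

A small p-value in only one direction points to a unidirectional relationship, small p-values in both directions to a bidirectional one, and large p-values in both directions to no Granger causality.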

7.2 Forecast of the dependent variable

  • Finally, for our results analysis we forecast sales_unit_boxes for the next twelve months with our Vector Autoregression model. In the chart, the forecast interval is shown as the grey area; the forecast fluctuates during the first months and stabilizes by the end of the year.

  • From the forecast we expect the highest sales value in March and the lowest in February.

  • Most of the negative forecast values fall at the beginning and in the middle of the year.

  • Sales of unit boxes tend to stabilize by the end of the year. Of the 12 forecast months, 5 have a negative forecast value for sales.

forecast <- predict(var_model2, n.ahead = 12, ci = 0.95)  # forecast for the next 12 months
fanchart(forecast, names = "sales_unit_boxes", main = "Sales unit boxes", xlab = "Time Period", ylab = "sales")

forecast
## $sales_unit_boxes
##                fcst      lower     upper        CI
##  [1,] -0.0254134123 -0.1735067 0.1226798 0.1480933
##  [2,] -0.0698473457 -0.2481691 0.1084744 0.1783217
##  [3,]  0.0263030809 -0.1686402 0.2212463 0.1949433
##  [4,]  0.0081452678 -0.1908338 0.2071243 0.1989791
##  [5,] -0.0062007142 -0.2069735 0.1945721 0.2007728
##  [6,]  0.0136862304 -0.1874639 0.2148363 0.2011501
##  [7,] -0.0014109828 -0.2028281 0.2000061 0.2014171
##  [8,] -0.0001866577 -0.2017088 0.2013355 0.2015222
##  [9,]  0.0038125135 -0.1977626 0.2053876 0.2015751
## [10,]  0.0003678387 -0.2012161 0.2019518 0.2015839
## [11,]  0.0020811909 -0.1995076 0.2036699 0.2015887
## [12,]  0.0024052560 -0.1991866 0.2039971 0.2015919
## 
## $itaee_growth
##               fcst     lower    upper       CI
##  [1,]  0.190528304 -1.087083 1.468139 1.277611
##  [2,] -0.154844316 -1.477634 1.167945 1.322790
##  [3,] -0.006394354 -1.339138 1.326349 1.332744
##  [4,] -0.007579971 -1.341272 1.326113 1.333693
##  [5,] -0.004343960 -1.338950 1.330262 1.334606
##  [6,] -0.006528384 -1.341883 1.328826 1.335354
##  [7,] -0.037560637 -1.373002 1.297881 1.335441
##  [8,] -0.013161167 -1.348867 1.322545 1.335706
##  [9,] -0.016633990 -1.352372 1.319104 1.335738
## [10,] -0.022969441 -1.358738 1.312800 1.335769
## [11,] -0.016838989 -1.352618 1.318940 1.335779
## [12,] -0.017657425 -1.353438 1.318123 1.335780
## 
## $unemp_rate
##                fcst      lower     upper        CI
##  [1,] -0.0141177180 -0.1720786 0.1438432 0.1579609
##  [2,] -0.0054375867 -0.1933223 0.1824471 0.1878847
##  [3,] -0.0448636810 -0.2352509 0.1455236 0.1903873
##  [4,]  0.0153818374 -0.1829245 0.2136881 0.1983063
##  [5,]  0.0002906435 -0.1993930 0.1999743 0.1996836
##  [6,] -0.0120620725 -0.2131676 0.1890434 0.2011055
##  [7,] -0.0020407195 -0.2034432 0.1993618 0.2014025
##  [8,] -0.0025407442 -0.2040421 0.1989606 0.2015013
##  [9,] -0.0063143076 -0.2078630 0.1952344 0.2015487
## [10,] -0.0046910778 -0.2062508 0.1968686 0.2015597
## [11,] -0.0043937429 -0.2059582 0.1971707 0.2015644
## [12,] -0.0045825815 -0.2061480 0.1969829 0.2015654
## 
## $consumer_sentiment
##               fcst       lower      upper         CI
##  [1,] -0.017940867 -0.11168433 0.07580259 0.09374346
##  [2,] -0.020359756 -0.12064282 0.07992330 0.10028306
##  [3,]  0.017384947 -0.08971254 0.12448243 0.10709748
##  [4,] -0.010596425 -0.11855438 0.09736153 0.10795796
##  [5,]  0.008701144 -0.09987355 0.11727584 0.10857470
##  [6,]  0.005137413 -0.10386716 0.11414199 0.10900457
##  [7,]  0.000641724 -0.10851596 0.10979941 0.10915769
##  [8,]  0.004606393 -0.10458598 0.11379877 0.10919237
##  [9,]  0.002666408 -0.10653915 0.11187197 0.10920556
## [10,]  0.002216835 -0.10699598 0.11142964 0.10921281
## [11,]  0.003474322 -0.10574306 0.11269170 0.10921738
## [12,]  0.002807770 -0.10641056 0.11202610 0.10921833
## 
## $max_temperature
##                fcst      lower     upper        CI
##  [1,]  0.0003296785 -0.1417503 0.1424096 0.1420799
##  [2,] -0.0195629344 -0.1658784 0.1267525 0.1463154
##  [3,] -0.0015873278 -0.1547730 0.1515983 0.1531857
##  [4,]  0.0029501463 -0.1516793 0.1575796 0.1546294
##  [5,]  0.0021484291 -0.1527186 0.1570155 0.1548670
##  [6,] -0.0015482753 -0.1565762 0.1534797 0.1550279
##  [7,] -0.0044873516 -0.1595464 0.1505717 0.1550590
##  [8,] -0.0026710166 -0.1577834 0.1524413 0.1551124
##  [9,] -0.0020264716 -0.1571475 0.1530946 0.1551210
## [10,] -0.0028325150 -0.1579596 0.1522945 0.1551271
## [11,] -0.0024005432 -0.1575290 0.1527279 0.1551285
## [12,] -0.0021521315 -0.1572810 0.1529768 0.1551289
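
The statements about the highest, lowest and negative months can be checked directly against the printed sales_unit_boxes forecast. The sketch below re-enters the fcst and CI columns by hand and assumes that forecast step 1 corresponds to January, which is how the report reads the output. The names sales_fcst, ci_half and the other objects are illustrative only.

```r
# Point forecasts for sales_unit_boxes, copied from the printed forecast output.
sales_fcst <- c(-0.0254134123, -0.0698473457,  0.0263030809,  0.0081452678,
                -0.0062007142,  0.0136862304, -0.0014109828, -0.0001866577,
                 0.0038125135,  0.0003678387,  0.0020811909,  0.0024052560)
# CI column from the same output: the half-width of the 95% interval.
ci_half <- c(0.1480933, 0.1783217, 0.1949433, 0.1989791, 0.2007728, 0.2011501,
             0.2014171, 0.2015222, 0.2015751, 0.2015839, 0.2015887, 0.2015919)

# Assumption: forecast step 1 is January, step 12 is December.
names(sales_fcst) <- month.abb

highest_month   <- names(which.max(sales_fcst))
lowest_month    <- names(which.min(sales_fcst))
negative_months <- names(sales_fcst)[sales_fcst < 0]

print(highest_month)
print(lowest_month)
print(negative_months)

# Does every 95% interval (fcst - CI, fcst + CI) contain zero?
all_intervals_span_zero <- all(abs(sales_fcst) < ci_half)
print(all_intervals_span_zero)
```

Under that assumption the check agrees with the bullet points: March is the highest month, February the lowest, and the negative months are January, February, May, July and August. Every 95% interval also contains zero, so the sign of each monthly forecast should be read with caution.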

8 Conclusions and Recommendations

The key insights from our analysis and forecasting are:

  • Maximum temperature and consumer sentiment seem to have an impact on our dependent variable, sales of unit boxes; however, consumer sentiment has an impact only in a certain period.

  • We can identify a pattern of variation in sales at the beginning of the year, where we find both the largest positive impact (March) and the largest negative impact (February) on sales; by the end of the year, sales seem to stabilize.

Taking these observations together, we can conclude that consumer sentiment might be the variable with the greatest impact on sales in the first part of the year, as shown in our results for lag 1, while maximum temperature has an impact during most of the year. Consumer sentiment represents how consumers feel about their finances and the state of the economy. Our recommendation is therefore to take this variable into consideration in the months where we expect a negative impact on sales (January, February, May, July and August) and to offer special promotions and discounts in supermarkets and convenience stores, so that consumers feel they do not have to spend much money on their favorite products.

9 Appendix